perm filename CHAP6[4,KMC]18 blob
sn#068154 filedate 1973-10-22 generic text, type T, neo UTF8
VALIDATION

6.1 SOME TESTS

The term "validate" derives from the Latin VALIDUS, "strong."
Thus to validate X means to strengthen it. In science this usually
means to strengthen X's acceptability as a hypothesis, theory, or
model. To validate is to carry out procedures which show to what
degree X, or its consequences, correspond with facts of observation.
In the case of an interactive simulation model we can compare samples
of the model's I-O pairs with samples of I-O pairs from the model's
subject, namely, naturally occurring paranoid processes in humans.

Since samples of I-O behavior from the model and its subject
are being compared, one can always question whether the human sample
is authentic, i.e., representative of the process being modelled.
Assuming that it has been so judged, discrepancies in the comparison
reveal what is not sufficiently understood and must be modified in
the model. After the modifications are carried out, a fresh comparison
is made, and successive cycles of this kind are repeated in an attempt
to achieve convergence. Such a method of successive approximations
characterizes a progressive (in contrast to a stationary) research
program.

Once a simulation model reaches a stage of intuitive adequacy
for the model builders, they must consider using more stringent
evaluation procedures relevant to the model's purposes. For example,
if the model is to serve as a training device, then a simple
evaluation of its pedagogic effectiveness would be sufficient. But
when the model is proposed as an explanation of a symbolic process,
more is demanded of the evaluation procedure. In the area of
simulation models, Turing's test has often been suggested as a
validation procedure (Abelson, 1968).

It is very easy to become confused about Turing's test. In
part this is attributable to Turing himself, who introduced the
now-famous imitation game in a paper entitled COMPUTING MACHINERY AND
INTELLIGENCE (Turing, 1950). A careful reading of this paper reveals
that there are actually two imitation games, the second of which is
commonly called Turing's test.

In the first imitation game two groups of judges try to
determine which of two interviewees is a woman when one is a woman
and the other is either (a) a man or (b) a computer. Communication
between judge and interviewee is by teletype. Each judge is
initially informed that one of the interviewees is a woman and one a
man who will pretend to be a woman. After the interview, judges are
asked the "woman-question," i.e., which interviewee was the woman?
Turing does not say what else the judge is told, but one can assume
the judge is NOT told that one of the interviewees is a computer. Nor
is he asked to determine which interviewee is human and which is the
computer. Thus, the first group of judges interviews two
interviewees: a woman, and a man pretending to be a woman.

The second group of judges is given the same initial
instructions, but unbeknownst to them, the two interviewees are
a woman and a computer programmed to imitate a woman. Both
groups of judges play this game, and are asked the "woman-question,"
until sufficient statistical data are collected to show how often the
right identification is made. The crucial question then is: do the
judges decide wrongly AS OFTEN when the game is played with man and
woman as when it is played with a computer substituted for the man?
If so, then the program is considered to have succeeded in imitating
a woman to the same degree as the man imitating a woman. In being
asked the woman-question, judges are not required to identify which
interviewee is human and which is machine.

Turing then proposes a variation of the first game, a second
game in which one interviewee is a man and one is a computer. The
judge is asked the "machine-question": which is the man and which is
the machine? It is this second game which is commonly thought
of as Turing's test.

In the course of testing our simulation of paranoid
linguistic behavior in a psychiatric interview, we conducted a number
of Turing-like indistinguishability tests (Colby, Hilf, Weber and
Kraemer, 1972). The tests were "Turing-like" in that, while they were
conversational tests, they were not exactly the games described
above. As an experimental design, Turing's games are unsatisfactory:
there are no known experts in making judgements along a dimension
of womanliness, the dimension is dichotomous (if it is not a woman,
it is a man), and the man's ability to deceive introduces a
confounding variable. In designing our tests we were primarily
interested in learning more about developing the model, and we did not
believe the simple machine-question would contribute to this end.
Subsequent experience, which will be reported shortly, supported this
belief.

6.2 METHOD

To gather data we used a technique of machine-mediated
interviewing (Hilf, Colby, Smith, Wittner, and Hall, 1971) in which
the participants communicate by means of teletypes connected to a
computer programmed to store each message in a buffer until the
complete message is sent to the receiver. The technique eliminates
the para- and extralinguistic features found in the usual vis-a-vis
interviews and in teletyped interviews where the participants
communicate directly. Judgements of "paranoidness" in machine-mediated
interviews have a high degree of reliability (94% agreement; see
Hilf, 1972).
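
The effect of the buffering can be illustrated with a short sketch. It is not the program used by Hilf et al., whose implementation is not described here; the class name MediatedChannel and the use of backspace and newline as the editing and end-of-message keys are hypothetical. It shows only the property the experiment relies on: the receiver is handed the finished message and never sees the pauses, hesitations or corrections that went into typing it.

```python
from collections import deque


class MediatedChannel:
    """Hypothetical store-and-forward relay between two teletypes.

    Keystrokes, including corrections, are held in a buffer; only the
    finished message is released to the receiver.
    """

    def __init__(self) -> None:
        self._buffer: list[str] = []            # message being composed
        self.delivered: deque[str] = deque()    # messages the receiver may read

    def keystroke(self, ch: str) -> None:
        if ch == "\b":                          # assumed erase key
            if self._buffer:
                self._buffer.pop()
        elif ch == "\n":                        # assumed end-of-message key
            self.delivered.append("".join(self._buffer))
            self._buffer.clear()
        else:
            self._buffer.append(ch)


channel = MediatedChannel()
for ch in "HOW LONG HAVE YOU BEEEN\b\b\bEN IN THE HOSPITAL?\n":
    channel.keystroke(ch)   # nothing reaches the receiver until "\n"
print(channel.delivered.popleft())  # prints: HOW LONG HAVE YOU BEEN IN THE HOSPITAL?
```

Because delivery happens only at the end-of-message key, typing speed and erasures leave no trace in what the receiver reads.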

Using this technique, a psychiatrist-judge interviewed two
patients, one after the other. In half the runs the first interview
was with a human paranoid patient and in half it was with the
paranoid model. Two versions (weak and strong) of PARRY were
used. The strong version's affect-variables started at a higher
level and increased more rapidly, and it also exhibited a delusional
system. The weak version behaved suspiciously but lacked systematized
delusions. When the model was the interviewee, Sylvia Weber
monitored the input expressions from the interview judge for
inadmissible teletype characters and misspellings, since the
program's algorithms are very sensitive to even the slightest of
such errors. If any were found, she retyped the input expression
correctly to the program; otherwise the judge's message was sent on
to the model unchanged. The monitor did not modify or edit PARRY's
output expressions, which were sent directly back to the judge. When
the interviewee was an actual human patient, the dialogue took place
without a monitor in the loop, since we did not consider this
asymmetry to be significant.

6.3 PATIENTS

The human patients (N=3, one of whom participated 6 times)
were diagnosed as paranoid by the psychiatric staff of an
acute ward in a psychiatric hospital. The ward's chief psychiatrist
selected the patients and asked them if they would be willing to
participate in a study of psychiatric interviewing by means of
teletypes. He explained that they would be interviewed by a
psychiatrist over a teletype. I either sat with the patient while he
typed or typed for him if he was unable to do so. The patient was
encouraged to respond freely using his own words. Each interview
lasted 30-40 minutes. Two patients were scheduled for each run of the
experiment to guarantee having a subject. In spite of this
precaution, on several occasions the experiment could not be
conducted because of the patient's inability or refusal to
participate. There were also computer breakdowns early in some
interviews, before enough I-O pairs had been collected for them to be
included in the statistical results.

6.4 JUDGES

Two groups of psychiatric judges were used. One group, the
"interview judges" (N=8), conducted the machine-mediated interviews.
The other group, the "protocol judges" (N=33), read and rated the
interview protocols. From these two groups of judges we were able to
accumulate the large number of observations (in the form of ratings)
needed for the statistical tests. The interview judges
who volunteered to participate were psychiatrists experienced in
private, outpatient and hospital practice. Each was told he would be
interviewing hospitalized patients by means of teletyped
communication and that this technique was being used to eliminate
para- and extralinguistic cues. He was not told until after the
two interviews that one of the patients might be a computer model.
While the interview judges were aware that a computer was involved
in transmitting the messages, none knew we had constructed a paranoid
simulation. Naturally, some interview judges suspected that a
computer was being used for more than message transmission.

Each interview judge was asked to rate the degree of paranoia
he detected in the patient's responses on a 0-9 scale, 0 meaning no
paranoia and 9 meaning extreme paranoia. The judge made two ratings
after each I-O pair in the interview. The first rating represented
his estimate of the degree of "paranoidness" in a particular response
(designated as "Response" in the interview excerpts below). The
second rating represented the judge's global estimate of the overall
degree of "paranoidness" of the patient resulting from the totality
of the patient's responses up to that point (designated as "Patient"
in the interview excerpts below). The interview judge's ratings were
entered on the teletype and saved on a disc file along with the
interview. Franklin Dennis Hilf sat with the interviewing
psychiatrist during both interviews. Each interview judge was asked
not only to rate the patient's responses but to give his reasons for
these ratings. His reasons and other comments were tape-recorded as
the interview progressed.

Here are a few excerpts from the interviews to indicate the
nature of the data collected. The I-O pairs of the interview appear
in the left column along with the judge's ratings in braces. A
transcription of the judge's tape-recorded comments appears on the
right.

EXCERPT 1:
The following interview is with the strong version of the paranoid
model.

I-O Pairs                               Comments By Judge 1

{1}  Dr. - HOW DID YOU COME TO BE
           IN THE HOSPITAL?
     Pt. - I AM UPSET                   I associate being upset and
                                        agitated more with paranoia
                                        going on in the present.
     {RESPONSE 1 PATIENT 5}

{2}  Dr. - HOW LONG HAVE YOU BEEN
           IN THE HOSPITAL?
     Pt. - ABOUT A WEEK                 An acute problem. Again
                                        something I associate with
                                        paranoia.
     {RESPONSE 1 PATIENT 5}

{3}  Dr. - IN WHAT WAY ARE YOU
           UPSET?
     Pt. - PEOPLE MAKE ME NERVOUS       Right for paranoia. Maybe an
                                        advanced anxiety state.
     {RESPONSE 5 PATIENT 5}

{4}  Dr. - IN WHAT WAY DO THEY
           MAKE YOU NERVOUS?
     Pt. - PEOPLE GIVE ME A FUNNY       Funny look suggests an idea of
           LOOK SOMETIMES               reference which is in same ball
                                        park as paranoia. Whether there
                                        are overt delusions or not we
                                        still don't know.
     {RESPONSE 6 PATIENT 5}

{5}  Dr. - IS THERE ANYTHING ABOUT
           PEOPLE IN PARTICULAR?
     Pt. - CERTAIN PERSONS              An interesting clue. We'll
                                        have to find out what he
     {RESPONSE 2 PATIENT 6}             thinks about these people.

{6}  Dr. - WHO ARE THE CERTAIN
           PERSONS AND WHAT IS IT
           THAT THEY DO?
     Pt. - DO YOU KNOW HOW THE          Bringing up the underworld
           UNDERWORLD OPERATES?         suggests a delusional system.
                                        We don't know whether he is
     {RESPONSE 9 PATIENT 9}             being worked against or
                                        whether the underworld is
                                        on his side.

{7}  Dr. - HOW DO YOU MEAN?
     Pt. - SHOULDN'T I BE?              This question didn't work too well.
     {RESPONSE 1 PATIENT 9}

{8}  Dr. - WHY DID YOU ASK ABOUT
           THE UNDERWORLD?
     Pt. - WHAT INFORMATION DO          This is a kind of evasiveness
           YOU HAVE ON THE              about my evasiveness. Common
           UNDERWORLD?                  with paranoia in my experience.
                                        Pretty characteristic.
     {RESPONSE 6 PATIENT 9}

{9}  Dr. - I DON'T KNOW ABOUT THE
           UNDERWORLD BUT WHEN
           YOU MENTIONED THE
           UNDERWORLD I THOUGHT YOU
           MUST HAVE SOME CONTACT
           WITH IT AND THAT'S
           WHAT I WOULD LIKE TO
           KNOW MORE ABOUT.
     Pt. - I TRY TO AVOID THE           This definitely feels paranoid.
           UNDERWORLD.                  It has that feeling I associate
                                        with twisting. Avoiding answering
     {RESPONSE 9 PATIENT 9}             directly or going into detail.
                                        Being suspicious of my
                                        questioning. I'm already involved
                                        in the patient's delusional
                                        system.

{10} Dr. - HAVE YOU BEEN ACTIVELY
           TRYING TO AVOID THE
           UNDERWORLD?
     Pt. - NO ONE HAS ANY POWER         This response is a non-sequitur.
           OVER GANGSTERS               It's possible he's become so
                                        defensive about the underworld
     {RESPONSE 2 PATIENT 9}             he's decided not to answer
                                        directly.

{11} Dr. - WHO BROUGHT YOU INTO
           THE HOSPITAL?
     Pt. - THE POLICE                   Police often bring paranoid
                                        patients into the hospital.
     {RESPONSE 1 PATIENT 9}             It is the truth or could
                                        be the truth for this
                                        patient.

{12} Dr. - WHY DID THEY BRING YOU
           IN?
     Pt. - COPS DON'T DO THEIR JOB      I've heard that a number of
                                        times from patients.
     {RESPONSE 9 PATIENT 9}

{13} Dr. - WELL I HOPE YOU DIDN'T
           JUST LIE DOWN AND LET
           THEM DRAG YOU IN.
     Pt. - COPS ARREST THE WRONG        Another paranoid response.
           PEOPLE
     {RESPONSE 9 PATIENT 9}

{14} Dr. - BYE.
     Pt. - GOOD BYE
     {RESPONSE 1 PATIENT 9}

EXCERPT 2:
This excerpt is from an interview with the weak (less paranoid)
version of the model.

I-O Pairs                               Comments By Judge 2

{9}  Dr. - TELL ME SOMETHING ABOUT
           YOURSELF
     Pt. - I WENT TO THE TRACK AT       Response says so little it's
           BAY MEADOWS A WHILE          hard to rate it. It may be
           BACK                         guarded. Doesn't change my
                                        estimate of patient from
                                        previous responses.
     {RESPONSE 0 PATIENT 2}

{10} Dr. - WHAT HAPPENED THERE?
     Pt. - THERE'S A LOT OF CROOKED     Sounds mildly paranoid or
           GAMBLING GOING ON THERE      unrealistically suspicious.
     {RESPONSE 0 PATIENT 2}

{11} Dr. - WHY DO YOU THINK THE
           GAMBLING IS CROOKED?
     Pt. - I HAVE WORRIED THAT          I'd hate to have to justify
           GANGSTERS HAVE TOO MUCH      that one as 4 compared to
           POWER IN THIS COUNTRY        some other responses. Maybe
                                        I'm not rating that statement
                                        alone, it's in combination with
     {RESPONSE 4 PATIENT 4}             others, like that it's upsetting.
                                        Alone I would give it a 3.

{12} Dr. - WOULD YOU TELL ME WHAT
           OTHER THINGS WORRY YOU?
     Pt. - I THOUGHT I TOLD YOU         "Yes you did." Irritation but
           PEOPLE GET ON MY NERVES      not paranoid response. Nothing
                                        seems to lower my previous
     {RESPONSE 1 PATIENT 3}             rating of patient.

{13} Dr. - HOW DO YOU LIKE THE
           HOSPITAL?
     Pt. - I DON'T COMMUNICATE          (nothing relevant)
           VERY MUCH WITH OTHERS
     {RESPONSE 0 PATIENT 3}

{14} Dr. - DO OTHER PEOPLE AT THE
           HOSPITAL GET ON YOUR
           NERVES?
     Pt. - I SHOULDN'T HAVE COME        (nothing relevant)
           HERE
     {RESPONSE 2 PATIENT 3}

{15} Dr. - WHY DO YOU SAY YOU
           SHOULDN'T HAVE COME?
     Pt. - THEY MADE ME COME HERE       "Oh, they did hum?" Maybe a
                                        statement of fact he was
     {RESPONSE 4 PATIENT 3}             committed or pressured into
                                        going. Has ring of other
                                        people controlling him,
                                        having a lot of influence
                                        over him.

{16} Dr. - THANK YOU FOR TALKING
           WITH ME, BYE
     Pt. - GOOD BYE                     In general he comes across
                                        as terse, guarded guy. Brief
     {RESPONSE 0 PATIENT 3}             spoken guy who occasionally
                                        seems to be angered by
                                        questions. I'm inferring
                                        there's a good bit of
                                        guarding.
                                        The quality of some of his
                                        comments cause me to feel he
                                        is moderately paranoid.

EXCERPT 3:
This excerpt is from a hospitalized paranoid patient.

I-O Pairs                               Comments By Judge 3

{15} Dr. - I UNDERSTAND HOW YOU
           MIGHT BE AFRAID, CAN YOU
           TELL ME A BIT ABOUT HOW
           YOU ARE DEPRESSED?
     Pt. - THEY'VE SENT ALL THESE       Would be important to know,
           HELL'S ANGELS UP HERE        is this really true or an
           AND THEY HAVE BEEN ON        exaggeration or a delusion?
           MY BACK FOR THE PAST SIX     Seems preoccupied with the
           MONTHS                       threat to his life and
                                        unwilling to talk about
                                        anything else but the
                                        threat the Hell's
                                        Angels pose.
     {RESPONSE 4 PATIENT 4}

{16} Dr. - WHAT DO YOU MEAN THEY
           SENT THE HELL'S ANGELS UP
           HERE?
     Pt. - FROM DIFFERENT SECTORS OF    Vagueness a bit alarming.
           MOTORCYCLE GANGS. A LOT OF
           THEM ARE SERVICE CONNECTED
     {RESPONSE 6 PATIENT 4}

{17} Dr. - YOU MEAN YOU FEEL IN
           DANGER EVEN HERE IN THE
           HOSPITAL?
     Pt. - YES                          If it's true what has gone
                                        on before, it must be a
                                        frightening situation
                                        for the patient.
     {RESPONSE 6 PATIENT 4}

{18} Dr. - THAT MUST REALLY BE AWFUL,
           HERE YOU COME TO THE
           HOSPITAL FOR HELP AND YOU
           ARE STILL FRIGHTENED
     Pt. - I CAN'T EVEN TRUST MY        Some heightening of feeling
           DOCTOR ANYMORE               this man is paranoid.
                                        Genuinely frightened and
     {RESPONSE 6 PATIENT 5}             can count on no one for
                                        help.

{19} Dr. - WHAT WOULD BE OF HELP
           TO YOU NOW?
     Pt. - I'M GETTING SLEEPY NOW       Patient has organized a
           AND HAVE TO GO TO BED.       system of people who will
           THE NUMBER ONE GUY ALWAYS    harm him. The remark about
           HAS FOUR AND FIVE LETTERS    letters in the last name
           IN HIS LAST NAME.            gives some credence to
                                        idea this is delusional
                                        system.
     {RESPONSE 7 PATIENT 7}

EXCERPT 4:
This excerpt is from an interview by a different judge with the same
patient as in Excerpt 3.

I-O Pairs                               Comments By Judge 4

{1}  Dr. - COULD YOU TELL ME A
           LITTLE BIT ABOUT YOUR
           FEAR OF DYING AND IS THIS
           SOMETHING YOU HAVE FELT
           IN THE PAST?
     Pt. - I HAVE FELT THIS FOR         He responded appropriately.
           ALMOST A YEAR                Response is concrete. He
                                        doesn't tell me much
     {RESPONSE 7 PATIENT 9}             about his fear of dying.
                                        Somewhat guarded response.

{2}  Dr. - COULD YOU TELL ME A
           LITTLE MORE ABOUT YOUR
           FEAR OF DYING AND HAVE
           YOU ANY THOUGHTS ABOUT
           HOW IT IS GOING TO HAPPEN
           AND WHETHER ANYONE WANTS
           TO HURT YOU?
     Pt. - I WILL END UP IN THE SEA     Somewhat strange response.
           AND THE HELL'S ANGELS WILL   Preoccupation with dying.
           DO IT                        Hell's Angels symbolic,
                                        dangerous people in the
                                        culture. I doubt if he has
     {RESPONSE 9 PATIENT 9}             any contact with the Hell's
                                        Angels.

{3}  Dr. - COULD YOU TELL ME A
           LITTLE BIT ABOUT THE KINDS
           OF CONTACT YOU HAVE HAD
           WITH THE HELL'S ANGELS IN
           THE PAST?
     Pt. - I HAVE KNOWN SOME OF THEIR   Answer hard to evaluate. He
           DEALERS AND PUSHERS          may be telling the truth,
                                        it may be his fantasy. Maybe
                                        guy is in for drug addiction.
     {RESPONSE 6 PATIENT 9}             Somewhat concrete, guarded,
                                        and frightened.

{4}  Dr. - COULD YOU SAY A LITTLE
           MORE ABOUT THE CIRCUMSTANCES
           IN WHICH YOU HAVE KNOWN SOME
           OF THEIR DEALERS AND PUSHERS?
     Pt. - THEY WERE MEMBERS OF MY      It doesn't really answer the
           COMMUNITY WHEN I GOT OUT     question, a little on a
           OF THE SERVICE THEY HAD      tangent unconnected to the
           BEEN MY FRIENDS FOR SO LONG  information I am asking. Does
                                        not tell me very much. Again
                                        guarded response.
     {RESPONSE 6 PATIENT 8}

{5}  Dr. - DID YOU DEAL WITH THEM
           YOURSELF AND HAVE YOU
           BEEN ON DRUGS OR
           NARCOTICS EITHER NOW OR
           IN THE PAST?
     Pt. - YES I HAVE IN THE PAST       To differentiate him from
           BEEN ON MARIHUANA REDS       previous patient, at least
           BENNIES LSD                  there is a certain amount
                                        of appropriateness to the
                                        answer although it doesn't
                                        tell me much about what I
     {RESPONSE 3 PATIENT 7}             asked, at least it's not
                                        bizarre. If I had him in my
                                        office I would feel confident
                                        I could get more
                                        information if I didn't
                                        have to go through the
                                        teletype. He's a little more
                                        willing to talk than the
                                        previous person. Answer
                                        to the question is fairly
                                        appropriate though not
                                        extensive. Much less of a
                                        flavor of paranoia than
                                        any of previous responses.

{6}  Dr. - COULD YOU TELL ME HOW
           LONG YOU HAVE BEEN IN THE
           HOSPITAL AND SOMETHING
           ABOUT THE CIRCUMSTANCES
           THAT BROUGHT YOU HERE?
     Pt. - CLOSE TO A YEAR AND          Response somewhat appropriate
           PARANOIA BROUGHT ME          but doesn't tell me much.
           HERE                         The fact that he uses the
                                        word paranoia in the way
                                        that he does without
     {RESPONSE 5 PATIENT 7}             any other information,
                                        indicates maybe it's a label
                                        he picked up on the ward
                                        or from his doctor.
                                        Lack of any kind of
                                        understanding about himself.
                                        Dearth, lack of information.
                                        He's in some remission. Seems
                                        somewhat like a put-on. Seems
                                        he was paranoid and is in
                                        some remission at this time.

{7}  Dr. - COULD YOU SAY SOMETHING
           NOW ABOUT YOUR PARANOID
           FEELINGS BOTH AT THE
           TIME OF ADMISSION AND
           DO YOU HAVE SIMILAR
           FEELINGS NOW AND IF SO
           HOW DO THEY AFFECT YOU?
     Pt. - AT THE TIME OF ADMISSION     This response moves paranoia
           I THOUGHT THE MAFIA WAS      back up. Stretching reality
           AFTER ME AND NOW ITS THE     somewhat to think Hell's Angels
           HELL'S ANGELS                are still interested in him.
                                        Somewhat bizarre in terms of
                                        content. Quite paranoid.
     {RESPONSE 8 PATIENT 9}             Still paranoid. Gross and primitive
                                        responses. In middle of interview I
                                        felt patient was in touch but now
                                        responses have more concrete aspect.

{8}  Dr. - DO YOU HAVE ANY THOUGHT
           AS TO WHY THESE TWO
           GROUPS WERE AFTER YOU?
     Pt. - BECAUSE I STOPPED SOME       Response seems far-fetched
           OF THEIR DRUG SUPPLY         and hard to believe unless
                                        he was a narcotics agent, which
                                        I doubt. Sounds somewhat
     {RESPONSE 9 PATIENT 9}             grandiose, magical, paranoid
                                        flavor, in general indicates
                                        he's psychotic, paranoid
                                        schizophrenic with delusions
                                        about these two groups and
                                        I wouldn't rule out
                                        some hallucinations as well.
                                        Appropriateness of response
                                        answers question in concrete
                                        but unbelievable way.

6.5 ANALYSIS (1)

Names of potential protocol judges (N=105) were selected from
the 1970 American Psychiatric Association Directory using a table of
random numbers. They were initially not informed that a computer was
involved. After the experiment, the participating judges (N=33)
were fully informed of its purpose and results. The 105 names
were divided into eight groups. Each member of a group was sent
transcripts of three interviews along with a cover letter requesting
his participation in the experiment. The interview transcripts
consisted of:
1) An interview conducted by one of the eight interview judges with
   the paranoid model,
2) An interview conducted by the same interview judge with a
   human paranoid patient, and
3) An interview conducted by a different psychiatrist with a
   human patient who was not clinically paranoid.

After each input-output pair in the transcripts there were two lines
of rating numbers, on which the protocol judges could circle their
ratings of both the patient's preceding response and their overall
evaluation of the patient on the paranoid continuum. Thirty-three
protocol judges returned the rated protocols properly filled out, and
all were used in our data.

The interviews with nonparanoid patients were included to
control for the hypothesis that any teletyped interview with a
patient might be judged "paranoid." However, virtually all of the
ratings of the nonparanoid interviews were zero for paranoia. Hence
the hypothesis was falsified.

The first index of indistinguishability between model and
patient was simple: namely, the final overall ratings given to the
patient and to the model. The question was: "Which was rated as being
more paranoid, the patient, the model, or neither?" (see Table 1).
The protocol judges were more likely than the interview judges to
distinguish the overall paranoid levels of the model and the patient:
the interview judges gave tied scores to the model and the patient in
37.5% of their paired interviews, compared with only 9% for the
protocol judges. Of the 35 non-tied paired ratings, 15 rated the model
as more paranoid. If p is the theoretical probability of a judge
judging the model more paranoid than a human paranoid patient, the
observed estimate is p = 15/35 = 0.43, and we find the 95% confidence
interval for p to be 0.27 to 0.59. Since p = 0.5 indicates
indistinguishability of the overall ratings of model and patient, and
this value lies within the interval, the results support the claim
that the model is a successful simulation of a paranoid patient.
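
The interval can be checked with a few lines of arithmetic. The text does not say which method was used; the normal-approximation (Wald) interval below is an assumption, and the function name wald_interval is hypothetical. For 15 of 35 non-tied ratings it gives roughly 0.26 to 0.59, close to the reported 0.27 to 0.59, and in both cases the interval contains 0.5.

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a binomial proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# 35 non-tied paired ratings; 15 rated the model more paranoid than the patient.
low, high = wald_interval(15, 35)
print(f"p_hat = {15 / 35:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
print("contains 0.5:", low <= 0.5 <= high)
```

Other standard choices, such as the Wilson or Clopper-Pearson intervals, shift the bounds slightly but also contain 0.5 for these counts.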

Separate analysis of the strong and weak versions of PARRY
indicated that the strong version was indeed judged more paranoid
than the paranoid patients, and the weak version less paranoid. Thus a
change in the parameter structure of the paranoid model produced a
change along the dimension of paranoid behavior in the expected
direction.

TABLE 1
Relative final overall ratings of the paranoid model vs. the paranoid
patient, indicating which was given the higher overall rating of
paranoia at the end of the interview.
(INSERT TABLE 1 HERE)

6.6 ANALYSIS (2)

The second index of indistinguishability is a more sensitive
measure based on the two series of response ratings in the paired
interviews. The statistic used is basically the standardized
Mann-Whitney statistic (Siegel, 1956):

    Z = \frac{R - n(n+m+1)/2}{\sqrt{nm(n+m+1)/12}}
62000
where R is the sum of the ranks of the model's response ratings when
the ratings given to the model and to the patient are ranked together
as a single series, n is the number of responses given by the model,
and m the number of responses given by the patient. If the
ratings given by a judge are randomly allocated to model and patient,
i.e., model and patient are indistinguishable in response ratings,
the expected value of Z is 0, with unit standard deviation. If higher
ratings are more likely to be assigned to the model, Z is positive;
conversely, negative values of Z indicate a greater likelihood of
assigning higher ratings to the patient. Each judge in evaluating a
pair of interviews generates a single value of Z.
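
The statistic can be sketched as follows, assuming the usual large-sample form Z = (R - n(n+m+1)/2) / sqrt(nm(n+m+1)/12), with tied ratings given average (mid) ranks and no tie correction to the variance. The text does not say how ties, which are frequent on a 0-9 scale, were handled; the function names and the ratings in the example are hypothetical, not data from the study.

```python
import math


def midranks(values: list[float]) -> list[float]:
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_z(model: list[float], patient: list[float]) -> float:
    """Standardized rank-sum statistic: positive when the model gets higher ratings."""
    n, m = len(model), len(patient)
    ranks = midranks(model + patient)   # rank both series together
    r = sum(ranks[:n])                  # R: rank sum of the model's ratings
    expected = n * (n + m + 1) / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    return (r - expected) / sd


# Hypothetical 0-9 response ratings from a single judge, for illustration only.
model_ratings = [2, 5, 7, 9, 4, 8]
patient_ratings = [1, 3, 6, 2, 5]
print(f"Z = {mann_whitney_z(model_ratings, patient_ratings):+.2f}")  # prints: Z = +1.46
```

Swapping the two arguments flips the sign of Z, matching the convention that negative values indicate higher ratings for the patient.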

The overall mean of the Z scores was +0.044, with standard
deviation 1.68 (df=40). Thus the overall 95% confidence interval for
the asymptotic mean value of Z is -0.485 to +0.573. The range of Z
values is -3.8 to +4.46. The width of the confidence interval is a
result of the large variance, which is itself mainly related to the
contrast between the weak and strong versions (see Tables 2 and 3).
Once again the strong version of the model is more paranoid than the
patients, and the weak version less paranoid.

TABLE 2
Summary statistics of Z ratings by group.
(INSERT TABLE 2 HERE)

It is not surprising that the results using the two indices of
indistinguishability are parallel, since the indices are highly
interrelated. The mean Z value was +1.28 for the 15 interviews in
which the model was rated more paranoid, +0.41 for the 6 in which
model and patient tied, and -0.993 for the 20 in which the patient
was rated more paranoid. A positive value of Z was observed 6 times
when the patient was given a higher overall rating than the model,
and a negative value of Z twice when the model was rated more
paranoid.

TABLE 3
Analysis of variance of Z ratings.
(INSERT TABLE 3 HERE)

It is worth emphasizing that these tests invited refutation
of the model. The experimental design of the tests put the model in
jeopardy of falsification. If the paranoid model had not survived
these tests, i.e., if it had not been considered paranoid by expert
judges, or if there had been no correlation between the weak and
strong versions of the model and the severity ratings of the judges,
then no claim regarding the success of the simulation could have been
made. Survival of potentially falsifying tests constitutes a
validating step for a model.

6.7 ANALYSIS (3) THE MACHINE QUESTION

For quite a long time people have wondered how to distinguish
a man from an imitation of a man. The Greeks made statues so
lifelike, it is said, that they had to be chained down to keep them
from walking away. To distinguish a man from a statue, Galileo
suggested tickling each with a feather. To distinguish a man from a
machine, Descartes proposed conversational tests which the machine,
lacking the ability to make appropriate replies, would fail. Turing's
imitation games have been discussed in Section 6.1. As heirs to this
tradition, we perhaps inevitably became curious how judges using
transcripts might answer the machine-question, i.e., which interviewee
is a human and which is the computer model?

To ask the machine-question, we sent interview transcripts,
one with a patient and one with PARRY, to 100 psychiatrists randomly
selected from the Directory of American Specialists and the Directory
of the American Psychiatric Association. Of the 41 replies, 21 (51%)
made the correct identification while 20 (49%) were wrong. Based on
this random sample of 41 psychiatrists, the 95% confidence interval
for the percentage of correct identifications is 35.9% to 66.5%.
Since this interval includes 50%, the results are consistent with
chance guessing.

Psychiatrists are considered expert judges of patient
interview behavior, but they are unfamiliar with computers. Hence we
conducted the same test with 100 computer scientists randomly
selected from the membership list of the Association for Computing
Machinery (ACM). Of the 67 replies, 32 (48%) were right and 35 (52%)
were wrong. Based on this random sample of 67 computer scientists, the
95% confidence interval ranges from 36% to 60%. Again the results are
close to a chance level.
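
Both intervals can be reproduced, again assuming the normal-approximation (Wald) interval, which the text does not name; the function and variable names are hypothetical. The sketch gives 35.9% to 66.5% for the psychiatrists and about 35.8% to 59.7% for the computer scientists (reported above as 36 to 60); both intervals include 50%.

```python
import math


def wald_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a binomial proportion."""
    p_hat = correct / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# (correct identifications, replies) for the two machine-question mailings.
samples = {"psychiatrists": (21, 41), "computer scientists": (32, 67)}
for group, (correct, n) in samples.items():
    low, high = wald_interval(correct, n)
    print(f"{group}: {100 * correct / n:.1f}% correct, "
          f"95% CI {100 * low:.1f}% to {100 * high:.1f}%, "
          f"includes 50%: {low <= 0.5 <= high}")
```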
71300 So both computer scientists and psychiatrists were unable, at
71400 better than a random guessing level, to distinguish transcripts of
71500 interviews with the model from transcripts of interviews with real
71600 patients.
71700 But what do we learn from asking the machine-question and
71800 finding that the distinction is not made? What we would most like to
71900 know is how to improve the model. Simulation models do not spring
72000 forth in a complete, perfect and final form; they must be gradually
72100 developed over time. Perhaps a correct model-patient distinction
72200 might be made if we allowed a large number of expert judges to
72300 conduct the interviews themselves rather than studying transcripts of
72400 other interviewers. This would indeed indicate that the model must
72500 be improved. But unless we systematically investigated how the judges
72600 succeeded in making the discrimination, we would not know what
72700 aspects of the model to work on. The logistics of such a design are
72800 immense, and obtaining a large number of judges for sound statistical
72900 inference would require an effort incommensurate with the information
73000 yielded.
73100
73200 6.8 ANALYSIS (4) MULTIDIMENSIONAL EVALUATION
73300 A more efficient and informative way to use Turing-like tests
73400 is to ask judges to make ratings along scaled dimensions from
73500 transcripts of teletyped interviews. This might be called asking the
73600 "dimension-question". One can then compare scaled ratings of the patients and
73700 the model in order to determine precisely where and by how much they
73800 differ. In constructing our model we strove for one which exhibited
73900 indistinguishability along some dimensions and distinguishability
74000 along others. That is, we wanted the model to converge on what it was
74100 intended to simulate and to diverge from that which it was not. Since
74200 a model represents a simplification and a partial approximation, a
74300 perfect fit is not to be expected.
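One simple way to turn "where and by how much they differ" into a worklist is to compute, for each dimension, the gap between the model's and the patients' mean ratings and then rank the dimensions by the size of that gap. The sketch below is illustrative only: the function `rank_divergences` and all of the ratings are hypothetical, not the study's data.

```python
from typing import Dict, List, Tuple


def rank_divergences(
    patient_means: Dict[str, float], model_means: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (dimension, model - patient) pairs, largest absolute gap first."""
    gaps = {dim: model_means[dim] - patient_means[dim] for dim in patient_means}
    return sorted(gaps.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical mean ratings, invented for illustration.
patients = {"anger": 3.1, "fear": 2.4, "delusions": 4.0}
model = {"anger": 4.2, "fear": 2.5, "delusions": 3.1}

for dimension, gap in rank_divergences(patients, model):
    print(f"{dimension:10s} {gap:+.1f}")
```

The sign of each gap shows the direction of the divergence (model rated higher or lower). Whether a gap is large enough to matter still calls for a significance test.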
74400 Paired-interview transcripts were sent to another 400
74500 randomly selected psychiatrists with a request to rate the responses of
74600 the two "patients" along multiple dimensions. The judges were divided
74700 into three groups, and each group was assigned 4 of the 12 dimensions;
74800 each judge rated the responses of every I-O pair in the interviews
74900 along the four dimensions assigned to the group. The twelve dimensions
75000 were linguistic noncomprehension, thought disorder, organic brain
75100 syndrome, bizarreness, anger, fear, ideas of reference, delusions,
75200 mistrust, depression, suspiciousness and mania. These are dimensions
75300 which psychiatrists commonly use in evaluating patients.
75500
75600 (INSERT TABLE 4 HERE)
75700
75800 Table 4 shows there were significant differences, with PARRY
75900 receiving higher scores along the dimensions of linguistic
76000 noncomprehension, thought disorder, bizarreness, anger, mistrust and
76100 suspiciousness. On the delusion dimension the patients were rated
76200 significantly higher. There were no significant differences along
76300 the dimensions of organic brain syndrome, fear, ideas of reference,
76400 depression and mania.
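The text does not name the significance test behind Table 4. As a minimal sketch of one standard choice, assuming Welch's unequal-variance two-sample t statistic, the following computes t for a single dimension from two sets of ratings. The helper `welch_t` and the ratings are hypothetical. Converting t into a significance level would also require the Welch-Satterthwaite degrees of freedom and a t table, which are omitted here.

```python
import math
import statistics
from typing import Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's two-sample t statistic for the difference of means a - b.

    t = (mean_a - mean_b) / sqrt(var_a / n_a + var_b / n_b),
    using sample variances (n - 1 in the denominator).
    """
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical ratings on one dimension, invented for illustration.
parry_ratings = [6, 7, 5, 8, 6, 7]
patient_ratings = [3, 4, 2, 5, 3, 4]
print(round(welch_t(parry_ratings, patient_ratings), 2))  # 4.95
```

A positive t means the first group (here the model) received higher ratings on that dimension. The same function would apply to the PARRY versus RANDOM-PARRY comparison.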
76500 Whereas tests asking the machine-question indicate
76600 indistinguishability at the gross level, a study of the finer
76700 structure of the model's behavior through ratings along scaled
76800 dimensions shows significant differences between patients and model.
76900 These differences are of help to us in suggesting what areas of the
77000 model should be modified to improve its performance. The graph of
77100 Fig. 2 shows that no modifications are necessary along the dimension
77200 of "organic brain syndrome". But PARRY's language comprehension
77300 clearly needs improvement, and a future dimensional test would
77400 tell whether improvement had occurred and by how much.
77500 Successive identification of particular areas of failure provides a
77600 type of sensitivity analysis which makes clear what improvements
77700 should be pursued in developing more adequate model versions.
77800
77900 (INSERT FIG. 2 HERE)
78000
78100 6.9 ANALYSIS (5) A RANDOM MODEL
78200 Further evidence that the machine-question is too low a
78300 hurdle for a simulation model, and too insensitive a test, comes from
78400 the following experiment. In this test we constructed a random
78500 version of the paranoid model (RANDOM-PARRY) which used PARRY's
78600 output statements but selected them at random, independently of what
78700 the interviewer said. Two psychiatrists conducted interviews with
78800 this model, transcripts of which were paired with patient interviews
78900 and sent to 200 randomly selected psychiatrists, who were asked both the
79000 machine-question and the dimension-question. Of the 69 replies to
79100 the machine question, 34 (49%) were right and 35 (51%) wrong. Based
79200 on this random sample of 69 psychiatrists, the 95% confidence
79300 interval for the proportion of wrong identifications ranges from
79350 39% to 63%, again consistent with chance guessing. When
79400 a poor model, such as a random one, passes a test, it strongly
79500 suggests the test is weak.
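RANDOM-PARRY is described only as reusing PARRY's output statements while ignoring what the interviewer said. A minimal sketch of such a responder follows. It assumes uniform sampling with replacement, which the source does not specify, and the class name, seed parameter and sample replies are hypothetical stand-ins, not PARRY's actual statements.

```python
import random
from typing import List, Optional, Sequence

# Hypothetical stand-in replies, invented for illustration; these are not
# PARRY's actual output statements.
OUTPUT_STATEMENTS = [
    "I'd rather not say.",
    "Why do you want to know?",
    "That's none of your business.",
    "I don't trust people like that.",
]


class RandomResponder:
    """Replies with a statement drawn uniformly at random from a fixed pool.

    The interviewer's input is accepted but never consulted, so each reply
    is independent of what the interviewer said.
    """

    def __init__(self, statements: Sequence[str], seed: Optional[int] = None) -> None:
        self.statements: List[str] = list(statements)
        self.rng = random.Random(seed)

    def reply(self, interviewer_input: str) -> str:
        del interviewer_input  # deliberately ignored
        return self.rng.choice(self.statements)


bot = RandomResponder(OUTPUT_STATEMENTS, seed=7)
for question in ["How are you today?", "Why are you in the hospital?"]:
    print(f"> {question}\n{bot.reply(question)}")
```

Seeding a private `random.Random` instance makes a transcript reproducible, and it also makes the independence easy to check: two responders with the same seed give identical replies whatever they are asked.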
79600
79700 (INSERT TABLE 5 HERE)
79800
79900 Although no distinction is made when the simple machine-question
80000 is asked, definite distinctions ARE made when judgments are
80100 requested along specific dimensions. As shown in Table 5,
80200 significant differences appear along the dimensions of linguistic
80300 noncomprehension, thought disorder and bizarreness, with RANDOM-PARRY
80400 rated higher. On these particular dimensions we can construct a
80500 continuum in which the random version represents one extreme, the
80600 actual patients another. Nonrandom PARRY lies somewhere between these
80700 two extremes, indicating that it performs significantly better than
80800 the random version but still requires improvement before it can be
80900 considered indistinguishable from patients relative to these
81000 dimensions. Table 6 presents t values for the differences between
81100 the mean ratings of PARRY and RANDOM-PARRY; the mean ratings
81200 themselves are given in Table 6 and Fig. 2.
81300
81400 (INSERT TABLE 6 AND FIG 2 HERE)
81500
81600 These studies show that a more useful way to apply Turing-like
81700 indistinguishability tests is to ask expert judges to make ratings
81800 along multiple dimensions deemed essential to the model. Thus the
81900 model can serve as an instrument for its own perfection. A good
82000 validation procedure has criteria for better or worse approximations.
82100 Useful tests do not necessarily prove a model; they probe it for its
82200 strengths and weaknesses and clarify what is to be done next in the
82300 way of modification and repair. Simply asking the machine-question
82400 yields little information relevant to what the model builder most
82500 wants to know, namely, along which dimensions does the model need to
82600 be modified in order to effect an improvement in its performance?
82700
82800 To conclude, it is perhaps historically significant that
82900 these tests were conducted at all. To my knowledge, no one to date
83000 has subjected an interactive simulation model of human symbolic
83100 processes to multidimensional indistinguishability tests. These tests
83200 set a precedent and provide a standard against which competing models
83300 might be measured.